
Sequence Models

Sequence-specific deep-learning models live under CSharpNumerics.ML.Sequence. They keep the existing IModel contract by interpreting each Matrix row as a flattened (timesteps x features) sample.
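A minimal sketch of how one flattened row could map back to (timestep, feature) pairs. The timestep-major ordering used here (all features of step 0, then all features of step 1, and so on) is an assumption for illustration and is not confirmed by this page; the helper name At is hypothetical.

```csharp
using System;

// Hypothetical helper: read feature f at timestep t from a flattened sample.
// Assumes a timestep-major layout: row[t * features + f].
static double At(double[] row, int features, int t, int f) => row[t * features + f];

// 3 timesteps x 2 features, flattened into one row of length 6.
double[] sample = { 10, 11, 20, 21, 30, 31 };

Console.WriteLine(sample.Length == 3 * 2); // True
Console.WriteLine(At(sample, 2, 1, 0));    // 20 (timestep 1, feature 0)
```

Whatever the actual ordering, the row length must equal TimeSteps x Features for the model to reshape it.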

Currently available models:

  • CNN1DClassifier in CSharpNumerics.ML.Sequence.Models.Classification
  • CNN1DRegressor in CSharpNumerics.ML.Sequence.Models.Regression
  • LSTMClassifier in CSharpNumerics.ML.Sequence.Models.Classification
  • LSTMRegressor in CSharpNumerics.ML.Sequence.Models.Regression
  • BiLSTMClassifier in CSharpNumerics.ML.Sequence.Models.Classification
  • BiLSTMRegressor in CSharpNumerics.ML.Sequence.Models.Regression
  • TCNClassifier in CSharpNumerics.ML.Sequence.Models.Classification
  • TCNRegressor in CSharpNumerics.ML.Sequence.Models.Regression

Current sequence infrastructure:

  • ISequenceModel in CSharpNumerics.ML.Sequence.Interfaces
  • ConvolutionPaddingMode in CSharpNumerics.ML.Sequence.Enums - Valid, Same, Causal
  • Conv1DLayer in CSharpNumerics.ML.Sequence.Layers - supports causal padding and dilation
  • MaxPool1DLayer in CSharpNumerics.ML.Sequence.Layers
  • GlobalAvgPool1DLayer in CSharpNumerics.ML.Sequence.Layers
  • FlattenLayer in CSharpNumerics.ML.Sequence.Layers
  • LSTMLayer in CSharpNumerics.ML.Sequence.Layers
  • BiLSTMLayer in CSharpNumerics.ML.Sequence.Layers
  • ActivationLayer in CSharpNumerics.ML.Sequence.Layers - parameter-free pointwise activation
  • DropoutLayer in CSharpNumerics.ML.Sequence.Layers - inverted dropout (train only)
  • BatchNorm1DLayer in CSharpNumerics.ML.Sequence.Layers - channel-wise normalisation
  • ResidualBlock in CSharpNumerics.ML.Sequence.Layers - TCN residual block (causal + dilated)
  • TCNBlock in CSharpNumerics.ML.Sequence.Layers - exponentially dilated residual stack
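To make "inverted dropout (train only)" concrete, here is a minimal standalone sketch of the standard technique. It is not the DropoutLayer source; the function name and signature are hypothetical. Kept activations are scaled by 1 / (1 - rate) during training so that inference needs no rescaling.

```csharp
using System;

// Hypothetical standalone version of inverted dropout (not the library's DropoutLayer).
static double[] InvertedDropout(double[] x, double rate, bool training, Random rng)
{
    // At inference time the layer is the identity.
    if (!training || rate <= 0.0) return (double[])x.Clone();

    double keep = 1.0 - rate;
    var y = new double[x.Length];
    for (int i = 0; i < x.Length; i++)
    {
        // Keep each unit with probability `keep` and rescale it; otherwise zero it.
        y[i] = rng.NextDouble() < keep ? x[i] / keep : 0.0;
    }
    return y;
}

double[] activations = { 1, 2, 3, 4 };
var random = new Random(42);

// Inference: values pass through unchanged.
Console.WriteLine(string.Join(", ", InvertedDropout(activations, 0.5, false, random))); // 1, 2, 3, 4

// Training: each value is either 0 or doubled (1 / 0.5), chosen at random.
Console.WriteLine(string.Join(", ", InvertedDropout(activations, 0.5, true, random)));
```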

🌊 CNN1D Architecture

Default CNN1D architecture:

Conv1D -> GlobalAvgPool -> Dense(hidden) -> Dense(output)

Optional variants:

  • UseMaxPooling = true inserts MaxPool1DLayer after the convolution.
  • UseGlobalAveragePooling = false switches to FlattenLayer before the dense projection.

Shared CNN1D hyperparameters:

  • TimeSteps
  • Features
  • Filters
  • KernelSize
  • ConvStride
  • Padding (Same, Valid)
  • UseMaxPooling
  • PoolSize
  • PoolStride
  • UseGlobalAveragePooling
  • HiddenUnits
  • LearningRate
  • Epochs
  • BatchSize
  • Activation

Additional regression hyperparameters:

  • L2
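The sketch below shows how KernelSize, ConvStride and Padding typically interact to set the sequence length that reaches the pooling stage. It uses the conventional definitions of Valid and Same padding, which are assumed (not confirmed here) to match Conv1DLayer; the helper name is hypothetical.

```csharp
using System;

// Conventional 1D convolution output lengths (an assumption, not taken from the library source).
// Valid padding requires timeSteps >= kernelSize.
static int ConvOutputLength(int timeSteps, int kernelSize, int stride, bool same)
{
    if (same) return (timeSteps + stride - 1) / stride; // ceil(T / s)
    return (timeSteps - kernelSize) / stride + 1;       // floor((T - k) / s) + 1
}

Console.WriteLine(ConvOutputLength(128, 5, 1, true));  // 128
Console.WriteLine(ConvOutputLength(128, 5, 1, false)); // 124
Console.WriteLine(ConvOutputLength(128, 5, 2, false)); // 62
```

With global average pooling the dense layer's input width is just Filters, whatever this length is. With FlattenLayer it would scale with the convolution output length, so padding and stride then directly change the parameter count.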

๐Ÿ” LSTM Architectureโ€‹

Default LSTM architecture:

LSTMLayer(returnSequences=false) -> Dense(hidden) -> Dense(output)

The LSTM layer implements the standard four-gate equations (forget, input, output, cell candidate) with full backpropagation through time (BPTT) and gradient clipping. Key features:

  • Forget gate bias initialized to 1.0 to reduce vanishing gradients
  • Configurable ClipNorm for gradient clipping (default: 5.0)
  • returnSequences=false outputs only the final hidden state
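For reference, the standard four-gate formulation reads as follows. The symbols (W, U, b for input weights, recurrent weights and biases) are conventional notation, not names from the library, and the last line shows the usual clip-by-norm rule, assumed here to be what ClipNorm controls.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(cell candidate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
g &\leftarrow g \cdot \min\!\left(1, \frac{\text{ClipNorm}}{\lVert g \rVert_2}\right)
\end{aligned}
```

With b_f initialized to 1.0 and small initial weights, f_t starts near sigma(1), roughly 0.73, so most of c_{t-1} is carried forward early in training. This is why the initialization helps against vanishing gradients.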

LSTM hyperparameters:

  • TimeSteps
  • Features
  • HiddenSize - LSTM hidden/cell state dimension
  • HiddenUnits - optional dense layer after LSTM
  • ClipNorm - max gradient norm (default: 5.0)
  • LearningRate
  • Epochs
  • BatchSize
  • Activation
  • L2

โ†”๏ธ Bi-LSTM Architectureโ€‹

Default Bi-LSTM architecture:

BiLSTMLayer(returnSequences=false) -> Dense(hidden) -> Dense(output)

The Bi-LSTM layer composes two LSTMLayer instances, one processing the input forwards and one backwards, and concatenates their hidden states per timestep so that the output dimension is 2 x HiddenSize.

When returnSequences=false, the output is [h_fwd_T | h_bwd_1], the final state of each direction: the forward pass finishes at timestep T and the backward pass finishes at timestep 1.

Bi-LSTM hyperparameters are identical to LSTM (same HiddenSize, ClipNorm, etc.). The dense layer automatically adapts to the 2 x HiddenSize input width.
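A small sketch of the returnSequences=false concatenation described above, using plain arrays in place of the library's types. The function name is hypothetical, and both state arrays are assumed to be indexed by input timestep.

```csharp
using System;
using System.Linq;

// fwd[t] and bwd[t] hold each direction's hidden state aligned to input timestep t (0-based).
// The forward pass finishes at the last timestep; the backward pass finishes at the first.
static double[] FinalBiState(double[][] fwd, double[][] bwd) =>
    fwd[fwd.Length - 1].Concat(bwd[0]).ToArray();

// T = 2 timesteps, HiddenSize = 2.
double[][] forwardStates = { new[] { 1.0, 2.0 }, new[] { 3.0, 4.0 } };
double[][] backwardStates = { new[] { 5.0, 6.0 }, new[] { 7.0, 8.0 } };

double[] combined = FinalBiState(forwardStates, backwardStates);
Console.WriteLine(combined.Length);             // 4 = 2 x HiddenSize
Console.WriteLine(string.Join(", ", combined)); // 3, 4, 5, 6
```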

โฑ๏ธ TCN Architectureโ€‹

A Temporal Convolutional Network stacks dilated causal convolutions inside residual blocks, giving an exponentially growing receptive field while preserving sequence length. Unlike an RNN it processes all timesteps in parallel, and unlike a plain CNN its causal padding guarantees no leakage from future timesteps.
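To illustrate the no-leakage property, here is a minimal single-channel causal dilated convolution written against plain arrays. It is a sketch of the general technique with a hypothetical function name, not the Conv1DLayer implementation.

```csharp
using System;

// y[t] = sum_j w[j] * x[t - j * dilation], treating x as zero before t = 0
// (equivalent to left-only zero padding). The output has the same length as the input.
static double[] CausalConv1D(double[] x, double[] w, int dilation)
{
    var y = new double[x.Length];
    for (int t = 0; t < x.Length; t++)
    {
        for (int j = 0; j < w.Length; j++)
        {
            int src = t - j * dilation; // never greater than t, so no future input is read
            if (src >= 0) y[t] += w[j] * x[src];
        }
    }
    return y;
}

double[] signal = { 1, 2, 3, 4, 5 };
double[] filtered = CausalConv1D(signal, new[] { 1.0, 1.0 }, 2);
Console.WriteLine(string.Join(", ", filtered)); // 1, 2, 4, 6, 8
```

Changing signal[4] would alter only filtered[4], because no earlier output reads it.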

Default TCN architecture:

TCNBlock -> GlobalAvgPool -> Dense(hidden) -> Dense(output)

Each TCNBlock is a stack of ResidualBlocks whose dilation doubles per level (1, 2, 4, 8, …). A residual block is:

Conv1D(causal, dilated) -> BatchNorm -> ReLU -> Dropout
-> Conv1D(causal, dilated) -> BatchNorm -> Dropout -> (+ skip) -> ReLU

The skip connection uses a 1×1 convolution when the channel count changes, otherwise an identity. The receptive field of an L-level block with kernel size k is 1 + 2(k-1)·(2^L − 1) timesteps; for example, 8 levels with k = 3 cover 1021 timesteps.
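The closed-form receptive field can be checked by summing the contribution of each level. The sketch below relies only on the structure stated above (two causal convolutions per residual block, dilation doubling per level); the function name is hypothetical.

```csharp
using System;

// Each residual block holds two causal convolutions with the same dilation d,
// and each convolution adds (kernelSize - 1) * d timesteps of context.
static int TcnReceptiveField(int kernelSize, int levels)
{
    int receptiveField = 1;
    for (int level = 0; level < levels; level++)
    {
        int dilation = 1 << level;                         // 1, 2, 4, 8, ...
        receptiveField += 2 * (kernelSize - 1) * dilation; // two convolutions per block
    }
    return receptiveField;
}

Console.WriteLine(TcnReceptiveField(3, 8)); // 1021
Console.WriteLine(TcnReceptiveField(3, 4)); // 61
```

A practical rule of thumb is to pick Levels so that the receptive field is at least TimeSteps. With k = 3, four levels cover only 61 timesteps of a 128-step window, while five levels cover 125 and six cover 253.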

TCN hyperparameters:

  • TimeSteps
  • Features
  • Channels - channel width of every residual block
  • KernelSize
  • Levels - number of residual blocks (dilation doubles each level)
  • DropoutRate
  • HiddenUnits - optional dense layer after global pooling
  • Activation
  • LearningRate
  • Epochs
  • BatchSize
  • L2

The dilated/causal layers are also usable standalone for custom architectures:

using CSharpNumerics.ML.Sequence.Layers;
using CSharpNumerics.ML.Sequence.Enums;
using CSharpNumerics.ML.Enums;

// Causal, dilated convolution: output[t] sees only inputs at or before t
var conv = new Conv1DLayer(
    inputChannels: 1, filters: 8, kernelSize: 3, stride: 1,
    padding: ConvolutionPaddingMode.Causal, activation: ActivationType.Linear, seed: 1, dilation: 4);
int rf = conv.ReceptiveField; // (3-1)*4 + 1 = 9

var tcn = new TCNBlock(inputChannels: 1, channels: 16, kernelSize: 3, levels: 8);
int reach = tcn.ReceptiveField; // 1021 timesteps

Example with SupervisedExperiment (CNN1D):

using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Classification;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<CNN1DClassifier>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("Filters", 8)
            .Add("KernelSize", 5)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.01)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)
            .Add("Padding", CSharpNumerics.ML.Sequence.Enums.ConvolutionPaddingMode.Same)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

Example with SupervisedExperiment (LSTM):

using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Classification;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<LSTMClassifier>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("HiddenSize", 32)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.001)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)
            .Add("ClipNorm", 5.0)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

Example with SupervisedExperiment (TCN regressor):

using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Regression;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<TCNRegressor>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("Channels", 16)
            .Add("KernelSize", 3)
            .Add("Levels", 4) // dilations 1, 2, 4, 8
            .Add("DropoutRate", 0.1)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.01)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

🪟 TimeSeries Integration - SequenceDataHelper

SequenceDataHelper bridges TimeSeries (from CSharpNumerics.Statistics.Data) to the sequence model pipeline by creating sliding-window samples.

using CSharpNumerics.ML.Sequence;
using CSharpNumerics.Statistics.Data;

// Load a light curve from CSV (columns: Time, Flux, Label)
var ts = TimeSeries.FromCsv("lightcurve.csv");

// Create windows of 128 timesteps, stride 1, using data column 1 ("Label") as the target (data column 0 is Flux)
var (X, y) = SequenceDataHelper.CreateWindows(ts, windowSize: 128, labelColumnIndex: 1, stride: 1);
// X shape: [numWindows x 128] (1 feature: Flux)
// y shape: [numWindows] (label from last timestep in each window)

Overloads:

  • CreateWindows(TimeSeries, windowSize, labelColumnIndex, stride) - extracts features and labels from a TimeSeries, excluding the label column from features.
  • CreateWindows(double[][], double[], windowSize, stride) - works with raw column arrays when labels are computed separately.
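The windowing itself can be sketched with plain arrays. This is an illustration of the sliding-window idea for a single feature column, not the SequenceDataHelper source: the function name is hypothetical, and dropping a trailing partial window is an assumption. Taking the label from the last timestep of each window follows the comment in the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-feature windowing; yields floor((N - windowSize) / stride) + 1 windows.
static (double[][] X, double[] y) Windows(double[] feature, double[] labels, int windowSize, int stride)
{
    var xs = new List<double[]>();
    var ys = new List<double>();
    for (int start = 0; start + windowSize <= feature.Length; start += stride)
    {
        var window = new double[windowSize];
        Array.Copy(feature, start, window, 0, windowSize);
        xs.Add(window);
        ys.Add(labels[start + windowSize - 1]); // label of the last timestep in the window
    }
    return (xs.ToArray(), ys.ToArray());
}

double[] fluxValues = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
double[] transitFlags = { 0, 0, 0, 1, 0, 0, 0, 1, 0, 1 };

var (windows, targets) = Windows(fluxValues, transitFlags, 4, 2);
Console.WriteLine(windows.Length);             // 4 windows, starting at 0, 2, 4, 6
Console.WriteLine(string.Join(", ", targets)); // 1, 0, 1, 1
```

Note that a stride smaller than the window size produces overlapping windows, so neighbouring samples share most of their timesteps. With k-fold cross-validation this can place near-duplicate samples in both the training and validation folds.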

๐Ÿ›ฐ๏ธ Exoplanet-Transit Detection Exampleโ€‹

Synthetic Kepler-like light curve -> windowed samples -> CNN1D classification:

using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence;
using CSharpNumerics.ML.Sequence.Models.Classification;
using CSharpNumerics.Statistics.Data;

// 1. Build a TimeSeries with flux and transit labels
var ts = new TimeSeries(times, new[] { flux, labels }, new[] { "Flux", "Label" });

// 2. Window into samples
var (X, y) = SequenceDataHelper.CreateWindows(ts, windowSize: 20, labelColumnIndex: 1, stride: 5);

// 3. Train a CNN1DClassifier with grid search
var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<CNN1DClassifier>(g => g
            .Add("TimeSteps", 20)
            .Add("Features", 1)
            .Add("Filters", 8)
            .Add("KernelSize", 5)
            .Add("HiddenUnits", 8)
            .Add("LearningRate", 0.02)
            .Add("Epochs", 150)
            .Add("BatchSize", 16)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 3))
    .Run();

// result.BestScore -> transit detection accuracy

🧩 Neural Network Building Blocks

The neural-network stack now exposes reusable components for sequence-oriented architectures without changing the existing IModel contract. Reusable dense/activation orchestration remains in CSharpNumerics.ML.NeuralNetwork, while sequence-specific layers and models live under CSharpNumerics.ML.Sequence.

Available infrastructure:

  • Activations for reusable ReLU, Sigmoid, Tanh, Linear, and Softmax transforms
  • ILayer for modular forward/backward layer composition
  • DenseLayer for trainable fully connected sequence steps
  • SequentialModel for stacking layers with shared forward/backward orchestration

These types are the reusable foundation for both generic feedforward models and the sequence-specific components in CSharpNumerics.ML.Sequence.

Example:

using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.NeuralNetwork;
using CSharpNumerics.ML.NeuralNetwork.Layers;
using CSharpNumerics.ML.Sequence.Models.Classification;
using CSharpNumerics.Numerics.Objects;
using CSharpNumerics.Numerics.Optimization.SingleObjective;

var model = new SequentialModel(
    new DenseLayer(4, 8, ActivationType.ReLU),
    new DenseLayer(8, 1, ActivationType.Linear));

var inputSequence = new[]
{
    new VectorN(new[] { 0.2, 0.4, 0.6, 0.8 })
};

VectorN prediction = model.ForwardSingle(inputSequence);
VectorN lossGradient = prediction - new VectorN(new[] { 1.0 });

model.BackwardSingle(lossGradient);
model.ApplyGradients(
    new GradientDescent(learningRate: 0.01),
    new GradientDescent(learningRate: 0.01),
    batchSize: 1);

var classifier = new CNN1DClassifier
{
    TimeSteps = 128,
    Features = 1,
    Filters = 8,
    KernelSize = 5,
    HiddenUnits = 16,
    LearningRate = 0.01,
    Epochs = 200
};